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ABSTRACT 

Gaussian  Elimination  is  a  method  used  to  find  the  solution 
vector  x  of  the  equation  A  x  =  b  where  A  is  any  general,  non- 
singular,  square  matrix  and  b  is  a  vector.   The  process  can  be 
broken  down  into  two  independent  phases: 

(1)  The  triangular  decomposition  of  the  matrix  A,  and 

(2)  The  back  substitution  process  to  find  the  solution  x. 
This  paper  describes  the  Gaussian  Elimination  algorithm  and 

the  Illiac  IV  routines  which  perform  the  process. 


AUTHOR'S  NOTE 

Until  I/O  specifications  for  ILLIAC  IV  are  completed, 
the  ASK  programs  described  in  this  document  will  not  be  a- 
vailable  for  use.   A  revised  document,  including  listings 
and  output,  will  be  produced  at  that  time. 
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1.   GAUSSIAN  ELIMINATION 

The  Gaussian  Elimination  Process  (used  to  solve  systems  of  simultaneous 
linear  equations)  involves  reducing  the  (square)  matrix  of  coefficients  into 
an  upper  triangular  matrix  and  performing  back  substitution  to  produce  the 
solution.   The  reduction  to  an  upper  triangular  matrix  is  done  in  N-l  steps 
(where  N  is  the  order  of  the  matrix)  as  follows: 

The  system  of  equations  is  written  as 


ail  Xl  +  ai2  X2  +  al3  X3  + 


+   a.      x     =  b 

In     n  1 


a21   Xl   +  a22   X2   +  a23  X3  + 


+   a0     x     =  b 
2n     n  2 


a\    x„    +   a   .   x^   +   a   _  x^  + 
nl      1  n2     2  n3      3 


+  a*      x     =  b 

nn     n  n 


In   forming  the   upper  triangular  matrix,   the   first    derived  system  is    found 


by  multiplying  the   first   row  by   a     /a        (for  a       ^  0)    and  subtracting  the 


pro 


duct  from  the  i  l  row  (i  >  l).   This  produces 


ail  Xl  +  ai2  X2  +  ai3  X3  + 

a(l)x  +  a(l)x  + 
22   2    23   3 


(1)      (1) 
an2  X2  +  an3  X3  + 


+  a.   x 

In  n 

♦  4lJ» 

2n  n 


=  b 


=  b 


1 

(1) 
2 


(1)     .(1) 

+  a   x  =  b 

nn  n    n 


The  new  coefficients  are  given  by 

a... 

*..-—  ali 
i«J   a.. 


1J 


11 


i,j  =  2,  . .. ,  n 


^=b.  -ii  b 
i   an   1 


i  =  2, 


The  second  derived  system  is  produced  by  multiplying  the  second  row  by 
a.p/ap„  (for  a_p  ^  0)  and  subtracting  the  product  from  the  i   row  (i  >  2) 


This  results  in: 


all  Xl  +  a12  X2  +  a13  X3  + 


a(l)x     +   a(l)x     + 
a22   X2        a23     3 


a(2)x     + 
a33  X3 


a<2>xq  + 

n3     3 


+  an      x 

In     n 


♦  a'1'* 

2n     n 


♦  a<2>x 

3n     n 


=  b 


=  b 


=  b 


1 

(1) 
2 

(2) 


(2)  ,(2 

+  a       x     =  b 

nn     n         n 


The  new   coefficients    are   given  by 

ai2  (1) 

■Jo)      a2j 

a22 

(1) 
ai2  ,(1) 

TiT    b2 

a22 


(2)  (1) 

a.  .      =   a.  . 


J2)        J1) 

b.        =  b. 

i  l 


i,   J   =   3, 


i  =   3, 


,   n 


Continue  this   process    (through  n-1   iterations)   until  the   entire  upper 
triangular  matrix  has  been   formed.      At   each   step  k,   the  new  coefficients 
are   found  using  the   formulae: 


(k-l) 
a(k)   =  a(k-l)   _  flk (k-1) 

aij  aij  Q^k_1)      ^J 

*kk 

(k-1) 
,(k)       ,(k-l)  ik  .(k-1) 

bi       =  bi  "  H£l)     \ 

\k 


i,   J   =  k+1, 


i   =  k+1 ,    . . . ,   n 


The   final   upper  triangular  matrix  is 


a, ,    x^    + 


x„  + 


11   Xl   +   \2   X2  +   ^3  X3 

a(l)x     +  a(l)x     + 
a22   X2        a23      3 

(2) 
a33  X3  + 


+   a.      x 

In     n 


x     =  b 


+   a2n  Xn   =  b2 


1 
(1) 


♦   afx     =  bf 

3n     n  3 


•■   a(n-l)  .  b(n-lj 

nn     n  n 


The   diagonal  elements   in  this   system  must    all  be  non-zero.      If  a 
diagonal   element    is   found  which   is   zero,   that   row  must  be   interchanged  with 
some   subdiagonal   row  in  the   system  which   contains   a  non-zero  element    in 
that   position.      This    interchange   does   not    affect  the   solution.      If  no   such 
row  can  be   found,   then  the  matrix  is   singular,    and  no  solution  is   possible. 

The  back   substitution  phase   of  Gaussian  Elimination  computes   the 
solution  vector  x.      This    is   done  using  the   formula: 

[.(i-1)  Y  (i-1)  I 

b.  -        >        a. .  x.  1  =  n,    .. . ,   1 

j=i+l      1J  J  J 

li  ° 


2.   TRIANGULAR  DECOMPOSITION 

The   Gaussian  Elimination  algorithm  described  in  Chapter  1   is    fine   for 
matrices  which   are   solved  once   and  then   forgotten.      However,    if  the   same 
matrix  of  coefficients    is   to  be   used  with  several   right-hand  sides    (perhaps 
at    different   points   in  time),   then  it  would  be   desirable  to  be   able  to 
find  a  solution  without   re- calculating  the  upper  triangular   form  of  the 
matrix.      Since  the   same  operations   must  be  performed  on  the   right-hand  side 
as   on  the  matrix  of  coefficients,    one    cannot  merely   save  the  upper  diagonal 
form  to  use   again. 

The  solution  of  this   problem  is  to  use  the  technique  of  triangular 
decomposition.      Any  nonsingular    (square)   matrix  A  can  be   decomposed  into 
the  product   of  a  lower  triangular  matrix  L_  and  an  upper  triangular  matrix  U 
(A  =  LU)  where  U  is   the  matrix   formed  in  the   first   phase  of  Gaussian  Elimina- 
tion  and  L  contains   the  multipliers   used  in   reducing  A  to   U.      The   lower 
triangular  matrix  L  looks   like: 


-a 


21 


11 


-a 


31 


11 


-a 


hi 


11 


-a 


(1)' 
32 


22 


-a(l) 
an2 

~JTT 
a22 


(n-2) 

"a  i 

n,n-l 

n-l,n-l 


Furthermore,    since  the   diagonal   elements   of  L  are   all   ones,   they  need 
not  be   stored.      Thus,  both  L   and  U     can  be   stored  in  a  single  n  x  n  matrix, 


possibly  overwriting  A. 

Once  A  has  been   decomposed  in  this  manner,    a  solution  for   any  right-hand 
side  b   can  be   found  by  solving  the   equations. 

Ly  =  b  for  y,    and 

Ux  =  y  for  the   solution  x. 

The   solution  Ux  =  y   is   precisely  the  back   substitution   described  above   for 
Gaussian  Elimination. 

L_  and  U  are   formed  simultaneously   from  left   to   right   one   column   at    a 
time  using  the   equations: 

Column  1: 

all   =  \l 

ail  =  *il  Ull  i=  2,   ....  n 


Column  2: 

\2  =  U12 


a22  =   *21   U12  +  U22 


ai2   =   lll  \2  +   li2   U22  i  =   3,    ...,   n 

Continue   in  this    fashion  to   generate   column   r   (l   <  r   <  n)   using  the   general 
equation: 

lr  lr 

a0     =   £        u.      +   u0 
2r  21      ±r  2r 


a       =   I        u.      +£.u0  + +   I  nu  +u 

rr         rl     lr         r2     2r  r,r-l     r-l,r         rr 

a.      =  I . _    u.      +£.0un  + +   I.      u  i  =  r+1 ,    ....    n 

lr  ll     lr  i2     2r  ir     rr 


The   above  equations    compute  L_  and  U  simultaneously,    column-by-column, 
and  overwrite  A     as  the   decomposition  is   formed.      No   calculations   are  done 
on  the  right-hand  side    (b)    at   this   point. 

The   complete   separation  of  decomposition   and  the   actual  matrix  solution 
allows  the  two   phases  to  be   coded  as   separate  routines.      This    is   advantageous 
since   other  matrix  problems    (notably   inversion  of  a  matrix)    can  proceed 
smoothly  using  L_  and  U  rather  than  A. 

There   are  two  problems    associated  with  triangular  decomposition  which 
need  to  be   considered.      The   first   is   positioning  for  size.   As  mentioned  in 
the   discussion  on  Gaussian  Elimination,    all   diagonal   elements   of  U  must  be 
non-zero.      This   requires   that    a  minimal   amount   of  row  interchanging  be   done. 
The   second  problem  is   scaling.      Computers    introduce   some   inherent   error  into 
any   computation  through   round-off,   truncation,   etc.      A  matrix  having  a  large 
range  of  magnitudes    can  quickly  lose   accuracy  because  of  computer  error.      One 
way  to   reduce   error  is   to  use  pivoting  to   assure  the  largest   possible    (in 
magnitude)    divisor  on  the   diagonals.      Since   complete  pivoting  requires   exten- 
sive,  messy   coding,   partial  pivoting  is   generally  used. 

In  partial  pivoting,   the   diagonal   and  subdiagonal   elements   in   any   column 
are   scanned,    and  the  element  having  the  largest   magnitude  is   selected  to  be 
the   diagonal    (pivot)   element.      The   row  containing  that   element    is    interchanged 
with  the   row   containing  the   diagonal  position.      Computation  then  proceeds 
normally.      A  record  must  be  kept   of  all  row  interchanges  to   facilitate 
solving  Lv  =  b.      Also,  because  triangular  decomposition   is   performed  column- 
wise,  two   restrictions  must  be  observed: 

1)  Do  not    interchange   elements  of  the   row  to  the  left   of  the   column 
being  calculated. 

2)  Update   a  new  column  before   searching  for   a  pivot.      (Step  1   of  the 
sequential    algorithm  below) . 


To   further  reduce  computer  error,    all   calculations   should  he   done 
using  double   precision  arithmetic. 

The   algorithm  for   forming  the  triangular  decomposition  of  a  square 
matrix  A  on  a  sequential    (conventional)    computer   follows  below-      The 
algorithm  generates  the   r        column  of  L  and  U   ,    (P.    records  the  number  of 

J 

the   row  interchanged  with  row  j   to   achieve  partial  pivoting   in   column  j): 

1)  Place  the   double-length  equivalent   of  column  r  into  D   (a  1   x  n 
double  precision  vector).      For  j    =  1,    ...,    r-1   do  the   following: 

a)  Take  D      >   convert    it   to   single  precision  to  get   u,    ,    and 

overwrite   a.      with  the   result.      Overwrite   D_     with  D.. 
jr  Pj  j 

b)  For   i  =   j+1,    ....   n     subtract   £..    u.      from  D.    using  double 

ij   jr       i 

precision  arithmetic.   Overwrite  D.  with  the  result. 

2)  (Partial  Pivoting).   Select  the  largest  D.  ,  i  =  r,  ...,  n,  call  it 

D-.  ,  round  it  to  single  precision  to  get  u   ,  and  overwrite  a 
Pr  rr  rr 


with  u   .   Store  P  and  overwrite  El,  by  D 
rr  r  P„     j 


3)   For  i  =  r+1,  ....  n,  divide  D.  by  u   to  get  £ .   (which  gives  a 

l     rr         lr 

single  precision  result)  and  overwrite  a.   with  the  result. 

lr 
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3.   THE  SOLUTION  PHASE 

The   solution  phase  uses  the   decomposition  of  A  and  the   right-hand  side 
vectors  b    ,    ...,   bm  to   find  the   solution  vectors   x    ,    ...,    xm.      The  right- 
hand  sides  must    first  he   updated  to  put   them  in  the   same   form  as  the  U  matrix 
i.e.,    solve  the  equation  Uy  =  b   for  y.      This    is   done   as    follows    for  each 
vector  ~b' 

Put   the   double  length  equivalent   of  b.    into  D. ,    i=l ,    ...,   n. 

For  i=l,    ...,   n   do  the   following: 

1)  Round  D       to   single  precision  to  get  v.,   overwrite  D       with   D. . 

p.  i  p.  l 

2)  For  j    =  i+1,    ....   n   subtract   £..    v.    from  D.    and  overwrite  D. 

Ji      i  J  J 

using  double  precision  arithmetic. 

Now  that   the  vectors  b    ,    ••.,   bmare  in  the   same   form  as   U,   the   solutions 
can  be   found  using  the  back  substitution  method.      For   each   right-hand  side, 
do  the   following    (right-hand  sides   are   still   in  double  precision   form  in  E>) : 

For  i=n,    ...,    1  perform  the   following   calculations: 

1)  Divide   D.   by    u.jj     to   get   x.  . 

2)  For  j   =  i-1,    ...,    1   subtract   u. .    x.    from  D.. 

Ji   i       J 

This  forms  the  solution  x  to  the  equation  Ax  =  b . 


k.      GAUSSIAN  ELIMINATION  PACKAGE  ON   ILLIAC   IV 

Up  until  now,   we  have   "been  talking  about   sequential   algorithms    for 
solving  systems   of  linear  equations.      The   remainder  of  this   report 
concerns  two   subroutines  written   for  a  parallel    processor  computer 
(illiac   IV)    in  the   assembly  language   for  that   machine    (ASK).      Special 
considerations    and  problems   related  to  the  machine   are   discussed  in 
detail. 
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5.   SPECIAL  TERMINOLOGY  USED  IN  ASK  DESCRIPTION 

The  following  terms  are  used  in  the  discussion  of  the  ASK  version  of 
the  Gaussian  Elimination  algorithm: 

Matrix  -  the  m  x  m  input  matrix,  in  the  form  described  under  "use 
of  the  Gaussian  Elimination  package". 

Segment  -  an  m  x  6k   piece  of  the  matrix  (i.e.,  6k   columns). 

Tract  -  a  6k   x  m  piece  of  the  matrix  (i.e.,  6k   rows). 

Field  -  a  6k   x  6k   piece  of  the  matrix  (i.e.  the  intersection  of  a 
segment  and  a  tract). 

Order  -  a  1  x  N  matrix  (where  N  is  the  order  of  the  array)  which 
contains  the  ordering  information  from  the  triangular 
decomposition  phase.   (i.e.  order  contains  the  record  of 
all  row  interchanges  made  during  triangular  decomposition 
in  order  to  achieve  partial  pivoting). 
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6.       I/O  ALGORITHMS   FOR  TRIANGULAR  DECOMPOSITION  PHASE 

Because   Illiac   IV  has   a  limited  amount   of  memory,   the  matrices   used  in 
this   routine   are  non-core-contained.      This  means  that    I/O  methods   are   crucial. 
They  must  he  the  most    efficient  possible  to   reduce  the   amount   of  I/O  wait 
time  to   a  minimum.      In  the  triangular  decomposition  phase,   the  unit   of  I/O 
is   a   field.      This    is   the   smallest    amount   of  data  read  into   core  or  "written 
onto   disk  at   any  one  time.      Memory  is   divided  into  buffers  or  6U   rows   each, 
allowing  each  buffer  to   exactly   contain  one   field.      Pointers    are   set   up  to 
each  buffer,    and  a  table   is  kept    for   fields    containing  information  such   as 
whether   it    is   in  memory,   which  buffer  it   is    in,   where  on   disk  it    is   located, 
if  there   is    an  uncompleted  read  or  write   in  progress,   etc. 

In  triangular   decomposition,   the  matrix  is   decomposed  by   segments.      A  segment 
is   read  in  and  updated  by  all  the   columns   of  the  matrix.      In  order  to  reduce 
round-off  errors,    each   field  of  the   segment  to  be  updated  is  processed  using  the 
fields  to  the   left   of  it   in  that   tract   of  L_  before  the  next    field  is   started. 
The  segment   is   then   decomposed  and  written   onto   disk.      In  order  to  make  updating 
tracts  possible,    row  interchanges   must    involve  the   entire  row   (including  elements 
of  both  U_  and  L)    instead  of  only  the  y_  elements   of  the  row  (which  was   done   in 
the  sequential   version.)      Thus,  the  third  phase  of  processing  a  segment    is   to 
read  in  appropriate   fields  of  each   segment  to  the  left    and   interchange  the 
rows   that  were  necessary   for  partial   pivoting  in  the   segment   just   processed. 

In  order  to  maximize  the   chances  that    information  is    in   core  when   it    is 
needed,   the   following  read  algorithm  is    initiated  whenever  Na  field  buffer   is 
freed.      (DONEFLAG    =  non-zero   indicates  that    all   information  needed  to   complete 
triangular  decomposition  has  been  read): 

l)      If  DONEFLAG    non-zero,   go  to   step  12. 
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2)  Test    condition   flag: 

If  CONDITION   all   zeros,    go  to   step   5. 
If  CONDITION  mixed,    go  to   step   3. 
If  CONDITION   all  ones,    go  to   step  9. 

3)  (Read  in  segments  to   do  row  interchanging):      Increment  tract  number 
by  1.    (If  greater  than  maximum  tract   in  matrix,   go  to   step  k .)    Test 
that   field  of  the   segment  being  read.      If  it    is    already   in   core, 
repeat    step   3.   If  not,    read  it,    set   up  proper  pointers,    and  go  to 
step  12. 

h)      Increment   segment   number  by  1.      If  the  result   equals   the  number  of 
the   segment   just   decomposed,   then  go  to   step  10.      Otherwise,    set 
the  tract   number  equal  to  the  number  of  the   segment   just   decomposed 
and  test  that   field  of  the   current    segment.      If  it   is    in   core,   go  to 
step   3.      Otherwise,    read  it,    set   up  proper  pointers,    and  go  to   step  12 

5)  (Read  fields  of  L  in   current  tract   for  updating  new  segment  prior  to 
decomposition):      If  an  unread  field  containing  elements  of  L   cannot 
be   found  in  this  tract,    go  to   step  6.      Otherwise,    read  the  left-most 
such   field,    set   up  proper  pointers,    and  go  to   step  12. 

6)  Increment  the  tract    counter.      If  the  new  counter  is   greater  than  the 
maximum  tract    in  the  matrix,    go  to  step  'J.      Otherwise  repeat   step  5. 

7)  Set    condition  flag  to  mixed   (i.e.,   next   read  will  get   segments    for 
updating  row-  interchanges).      Set   segment    counter  to   0.      Set   tract 
number  equal  to  number  of  segments  to  be   decomposed.      If  the   field 
of  the   segment  to  be   read   (which   is   pointed  to  by  the  tract  number) 
is   in   core,    go  to   step   3.      Otherwise  read  it,   set   up  proper  pointers, 
and  go   to   step  12. 
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8)  (Read    the  next   segment  to  be   decomposed  that    is  not   already   in   core): 
Increment  tract    counter.      If  it   is   greater  than  the  maximum  tract 

in  the  matrix,    go  to   step  9-      Otherwise,    read  that   field  of  the 
segment,    set   up  proper  pointers,    and  go  to   step  12. 

9)  Set   condition  flag  to   all  zeros    (i.e.,   next   read  "will  begin  reading 
tracts  of  L_  to  update   segment   to  be   decomposed).      Set  tract   number 
to   0.      If  first    field  of  first  tract   is   in   core,   go  to  step  5. 
Else   read  it,    set   up  proper  pointers,    and  go  to   step  12. 

10)  If  all   segments  yet   to  be   decomposed  have  not  been  read,    go  to   step 
11.      If  all  tracts  needed  to  update  the   segment  to  be   decomposed 
next   have  not  been  read,    go  to   step  9.      Otherwise,   set    DONEFLAG  to 
non-zero   and  go  to   step  12. 

11 )  Set    condition   flag  to   all  ones    (i.e.    next  read  will  begin  reading  a 
segment  to  be   decomposed).      Set   tract  number  to  0.      Increment    counter 
for  segments  to  be   decomposed.      Read  first    field  of  segment,   set   up 
proper  pointers,   and  go  to   step  12. 

12)  End  read  routine. 

The  output  algorithm  is  equally  important.   Fields  are  only  written  onto 
disk  when  necessary.   Thus,  the  following  rules  cover  output  procedures: 

1)  While  computing  the  decomposition  of  any  segment,  write  each  field 
onto  disk  as  soon  as  it  is  completed. 

2)  While  making  row  interchanges  on  segments  previously  decomposed, 
write  any  fields  that  were  changed  onto  disk  when  all  interchanges 
are  complete  on  the  segment. 
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The  last  important  consideration  for  I/O  is  the  algorithm  used  to  free 
buffers.   The  rules  governing  this  phase  are  as  follows: 

1)  During  the  update  phase  (updating  segment  to  be  decomposed  by 
previously  calculated  segments),  fields  of  L  used  in  the  updating 
should  be  freed  as  soon  as  they  are  used. 

2)  During  the  row  interchange  phase  (updating  previously  calculated 
segments  by  segment  just  decomposed)  free  unchanged  fields  of  L_ 
as  soon  as  all  interchanges  are  made  to  the  segment. 

3)  Free  fields  of  the  segment  being  decomposed  only  after  all  compu- 
tations have  been  completed. 

h)      Free  any  field  being  ouput  as  soon  as  the  write  is  completed. 
5)  Whenever  a  buffer  is  freed,  initiate  a  new  read  into  that  buffer 
immediately. 
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7.      TRIDEC   -   THE  TRIANGULAR   DECOMPOSITION  PHASE 

TRIDEC   is    an  ASK  subroutine  which  performs   the  triangular  decomposition 
phase  of  Gaussian  Elimination  in  a  parallel  manner  on   Illiac   IV.      In  the 
following  discussion,  whenever  operations   on  rows   or  columns   of  the  matrix 
are  referred  to,   the   algorithm  will   perform  the  operation   iteratively  on 
up  to  6k   elements  of  the  row  or  column  simultaneously. 

L  and  U  are   formed  column-wise  working  from  left  to   right  through  the 
matrix.      Since  the  matrix  is   non-core-contained,   only  one   segment   of  the 
matrix  is    assumed  to  he  in   core   at    any   given  time.      The   decomposition  must 
therefore  occur  in  three   steps: 

l)      The   update  phase  -  For  each   segment   S_  after  the  leftmost   one,   the 
following  steps  must  be  taken  to  bring  the   data  up  to   date;    i.e., 
the  same  operations  must  be  performed  on  segments   S_  as  have  been 
previously  performed  on  segments   0,   1,    ...,   S_-l. 

a)  Make   all  row  interchanges   in   segment   S_  that  have  previously 
been  made   in  earlier  segments    in   order  to   accomplish  partial 
pivoting,    i.e.,    for  i  =  0,   1,    ...,   S_®6U-1   interchange  row  i 
with  row      0RDER( i)  in  the  segment. 

b)  Let  I  indicate  which  field  of  segment  S_  is  being  updated.  Let 
J  represent  the  total  number  of  tracts  in  the  matrix  (counting 
from  0).      For  1=0,1,    . . . ,  J   do  the   following: 

Let   K  be  the  number  of  fields   of  L  which   are  to  the  left   of 
segment   S    (i.e.,    for  tracts   0,   1,    ...,   S-l ,  we  have  k  =  0,1,...,  S-l. 
For  tracts   S,    ...,   j  we  have  k  =   S-l).      Let   L  represent  which   field 
of  tract    I   is  being  updated  by.      Then   for  L  =  o,    1,    .. .,    ^  do 
the   following: 
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Let    Q  indicate  which   column  of  field    L  is  being  used. 
Let   R  indicate  which   column  of  segment   S_  is  being 
updated.      For  R    =0,    1,    ...,   63  do: 

For  Q   =  0,    1,    ...,    63  multiply  element  Q   of  field  L 

of  segment   S_  by   column  Q    of  field  L    of  tract  I . 

Subtract   the   result   from  column  R   of  field  I    of 

segment   S. 

2)  Decomposition  Phase   -  The  U  elements   in   fields   0,   1,    ...,   S-l   are  now 
computed.      If  only  remains    for  us  to   compute  the  elements  of  L_  and  the 
few  elements   of  U  located  in   field  S.      Let    i   represent  the   columns   of 
S_.      For   i  =  0,    1,    .. ..,    63  do  the   following: 

a)  Find  the  maximum  magnitude   among  elements   6k  (g>  S+i  through  m  of 
column  i.      Set  order   (6k  @S+i)   to  the  row  number  of  that  maximum 
element.      Interchange   rows   6k  (K)  S+i   and  order   (6k  ©S+i). 

b)  Divide  the  subdiagonal   elements   of  column  i  by  the  new  diagonal  elemenl 

c)  Let   j    represent   the   columns      in     S_  to  the  right   of  column   i.      For 

j    =   i+1,    ...,    63,   multiply  the   subdiagonal  elements  of  column   i  by 
element   6k  Q  S+i   of  column  j.      Subtract  the  product    from  the   corre- 
sponding elements   of  column   j. 

3)  Row  Interchange  Phase   -  In  order  to   update  new  segments  tract-wise,    all 
elements    in  rows  must  be   interchanged.      The  newly -completed  segment    is 
being  written  onto   disk.      At  this  time,   previous   segments    are  updated. 
Let    I  indicate  the   segment  being  updated.      Read  in  only   fields   S,   S+l , 
...,    J  of  each  segment.      For   1=0,    ...,   S-l   do  the   following: 

Let   k  indicate  the   row  interchanges  necessary.    For  k=   S  ©  6k ,    ..., 
(S+l)  ©  6k-l   interchange   row  k  and  row  ORDER(k)     of  segment    I. 
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8.       I/O  ALGORITHMS   FOR  BACK  SUBSTITUTION  PHASE 

As  previously  stated  in  the  triangular  decomposition   discussion, having 
a  non-core-contained  matrix  forces  the   I/O  algorithms  to   assume   a  prominent 
position   in  the   algorithm  being  coded.      The  back   substitution  phase  poses 
less   problems  than  the   decomposition  phase,  but  I/O  still  maintains   a  position 
of  importance.      Back  substitution  is    done   in  two   steps   -  manipulate   a  set   of 
right-hand  sides  to   correspond  to  the   decomposed  matrix   (Ly  =  b),    and  perform 
back  substitution  to   get    a  solution  to  the  matrix   (Ux  =  y).    The  read  algorithm 
below  assures  that   input  wait  time   is   minimal: 

1)  If  DONEFLAG  non-zero,   then  go  to   step  9-      Otherwise  go  to  step  2. 

2)  If  last   input  was    a  field  from  U,    go  to   step  6.      Otherwise   go  to 
step   3. 

3)  If  no  unread  field  of  L_  exists    in   current  tract,   go  to   step  k. 

Else   read  first   such   field,    set   up  proper  pointers,    and  go  to   step  9- 
k)      If  current  tract    is   last   tract    in  matrix,    go  to   step   5.      Otherwise 
increment  tract    counter  by  1  and  go  to   step   3. 

5)  Set    flag  to   indicate   reading  fields  of  U_.      Decrement  tract    counter 
by  1   and  go  to   step  6. 

6)  If  no  unread   field  appears   in   current  tract,    go  to   step    7.      Else 
read  first   such   field,    set   up  proper  pointers,   and  go  to   step  9. 

7)  If  current  tract   is  tract   0,    go  to  step  8.      Else  decrement  tract 
counter  by  1  and  go  to   step  6. 

8)  Set   DONEFLAG  non-zero.      Go  to   step  9- 

9 )  End  read  algorithm. 

Fields   of  L  and  U  are  never  output.      The   right-hand  sides   of  the  matrix 
are   read  in  segments  of  6k   columns   each.      The   right-hand  side   is   output  when 
the  back  substitution  is    complete. 
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New  fields   are  read  whenever  a  buffer  is   freed.      The  rules   governing  the 
freeing  of  buffers    are   as    follows : 

1)  All   subdiagonal   fields   of  L  are   freed  as   soon   as  they  have  been  used. 

2)  The   diagonal   fields   of  L_  are   freed  as   follows: 

a)  If  the  matrix  has  10  tracts   or  less,   do  not    free  the   diagonals 
of  L. 

b)  If  the  matrix  has  between  11   and  17  tracts,    free  all  but  the 
last    four  diagonals. 

c)  If  the  matrix  has  more  than  17  tracts,    free   all  but  the  last 
diagonal. 

3)  Free   all   fields  of  U  immediately  after  use. 
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9.      GAUSS   -   THE  BACK  SUBSTITUTION  PHASE 

GAUSS   is    an  ASK  subroutine  which  performs  the  back  substitution  phase  of 
the  Gaussian  Elimination  process   in   a  parallel  manner  on   Illiac   IV.      It 
assumes  that  the  matrix  to  be   solved  has    already  been  decomposed  into  L_  and 
U  by  TRIDEC . 

The  solution  of  the  matrix  is   computed  from  L  and  U  using  the  method  of 
back  substitution   described  above   for  Gaussian  Elimination.      Any  number  of 
solutions  may  be   found  --one  for  each   right-hand  side  provided  by  the  user. 
Up  to   6k   solutions    can  be   found  at   one  time.      The   results   are   obtained  by 
working  from  bottom  to  top  through  the  matrix.      Elements   of  U  are   read  in 
tract-wise   from  bottom  to  top.      To  minimize  round-off  error,   only  one  field 
at    a  time  of  the   right-hand  side   is    involved  with  the   calculation. 

The  solution   is   found  in  two   steps.      First,  the  right-hand  side  must 
undergo  the   same  transformation  as  the  original  matrix  did  during  the   decompo- 
sition process   described  above.      This   is    done  by  solving  LJ  =  b   for  y. 
Second,   the  back  substitution  mentioned  above   is    carried  out   in  order  to 
find  the   solution.      This    is    done  by  solving  Ux  =  y   for  x.      These   steps   are 
described  below: 

l)      Update   right-hand  sides:      In  order  to  perform  back  substitution,  the 
right-hand  sides  must  be  permuted  in  the   same  way  that  the  matrix  was 
permuted  in   forming  U.      The   information  necessary  to   do  this  permuta- 
tion  is    contained  in  L_  and  ORDER.      The  permuting  of  the  right-hand 
sides  b   is   done   in  two  phases.      Let   n   represent   the  number  of  rows 
in  b.      Hence  n-1   is  the   index  of  the  last   row  of  b.       (Numbering 
begins   at    zero) : 

a)      For  1=0,    . ..,   n-1   interchange  row  I  of  b  with  row     ORDER(l)   of 
b.      This    aligns   the  rows    in  the   same  order  as  the   rows   in  L   and  U. 
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b)  Solve  Ly  =  b  for  y.  This  is  done  by  tracts  of  L.  Let  m  be 
the  number  of  the  last  tract  of  L:  m  =  (n-l)/61+.  Let  I  be 
the  tract    counter.      For  1=0,    . . . ,   m  do  the   following: 

Let   J   indicate   fields  within  tract    I.      For  J  =  0,    ...,    I 
do  the   following: 

Let    K  indicate  the   columns  within   field  J  of  tract   I  of 
L.      For   K  =  0,    ...,    63,    do  the   following: 

Let   P  indicate  the   columns   of  field  I  of  the   segment 
of  right-hand  sides   for  which   a  solution   is    currently 
being  found.      For  1  =  0,    ...,   Q   (where  Q  =  number  of 
columns   in  the   segment),    do  the   following: 

Multiply   column   K  of  field  J  of  tract    I  of  L 
by  element   P  of  row  K  of  field  J  of  the   segment 
of  right-hand  sides   and  subtract  the  products    from 
column  P  of  field  J  of  the   segment   of  right-hand 
sides . 
Back   Substitution:      The   solution   is    found  by  performing  back   substitu- 
tion using  U  and  the   array  of  permuted  right-hand  sides.      It    is 
computed  tract-wise   from  bottom-to-top  using  the  back  substitution 
algorithm  described  below.      There   are  two   steps   necessary   in  perform- 
ing back   substitution  because  the  matrix  is   not    core-contained. 
For  tracts   I  =  m,    ...,    0  of  U,    do  the   following: 
a)      Update  Phase:      The   current   field  of  right-hand  sides  must  be 
updated  by  that   part   of  the   solution   already   computed.      Let   S 
indicate   fields  of  tract    I.      For  J  =   1+1 ,    ...,   m  do  the   following: 
Let   K  indicate  the   columns  within   field  J  of  tract    I  of  U. 
For  K  =  0,    ...,    63,    do  the   following: 
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Let   P  indicate  the   columns   of  field  I  of  the  segment 
of  right-hand  sides   for  which   a  solution   is  being 
computed.      For  P  =  0,    ...',   Q   (where  Q  is   the  number  of 
columns   in  the  segment),    do  the   following: 

Multiply   column   K  of  field  J  of  tract   I  of  U  by 
element    K  of  column  P  of  field  J  of  the   segment   of 
right-hand  sides   and  subtract  the  products   from 
column  P  of   field  I  of  the   segment   of  right-hand 
sides, 
b)      Computation  Phase:      Let   J   indicate  the   columns   of  field  I  of 
tract    I  of  U.      For  J  =   63,    ...,    0,   do  the   following: 

Divide  row  J  of  field  I  of  the   segment   of  right-hand  sides 
by  element   J  of  column  J  of  field  I  of  tract    I  of  U.      Let 
K  indicate   columns   of  field  I  of  the  segment  of  right-hand 
sides.      For    K=  0,    ...,   Q   (where  Q  is   the  number  of  columns 
in  the   segment),    do  the   following: 

Multiply  the   superdiagonal  elements  of  column  J  of 
field  I  of  tract   I  of  U  by  element   J  of  column    K  of 
field  I  of  the   segment   of  right-hand  sides   and  subtract 
the  products    from  the   corresponding  elements   of  column 
K    of  field  I  of  the   segment   of  right-hand  sides. 
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10 .      USE_0F  THE   GAUSSIAN  ELIMINATION   PACKAGE 
Triangular  Decomposition: 

The  triangular   decomposition   routine    (named  TRIDEC)    is   an    ASK  subroutine 
(called  in  the   standard  fashion)  with  these  parameters: 

MATRIX  -   an  m  x  m   (square)   matrix   (m  *  n).      On  entrance,    it    contains  the 
matrix  to  he   decomposed.      The  matrix  is   expected  to  he  padded  on 
the  bottom  and  to  the  right  with   rows   and  columns  of  zeros   so  that 
m  is    a  multiple  of  6k.    (This   is    important    in  simplifying  the   coding 
so  that   execution  time  may  be   cut).      It   is   expected  that  the  matrix 
will  be  located  on   disk.      It    is    further  expected  that  the  matrix 
was   read  onto   disk  in  the   following  manner: 

Store  the  matrix  tract-wise  by  fields   and  store   fields  by 
rows,   i.e.,   let   Q  be  the  number  of  the  last  tract   in  the 
matrix   (Q  =    (n-l)M).      For  tracts  0,    ....   Q  do  the  following: 

Let    I   indicate  which   field  in  the  tract    is  being  written. 
Let   J  indicate  which  row  of  field  I   is  being  written. 
For   I  =  0,    ... ,    Q,    do  the   following: 

For  J  =  0,    ...,63,  write  row  J  onto  disk. 
The   actual  parameter  passed  is  the   disk  reference  of  the  matrix. 
Upon   completion  of  the   algorithm,   the  matrix  will  have  been  over- 
written  and  will   contain  L  and  U,   i.e.   the   decomposition  of  the 

matrix. 

N  -  The  order  of  the  matrix  to  be   decomposed   (before  padding). 

EPS  -  The   desired  cut-off  value   for  determining  the   singularity  of  the 

matrix.      If,    after  partial  pivoting  is   done,   the  pivot   element 
has    a  magnitude  less  than  EPS,   then  the  matrix  is    considered  to 
be  singular  and  computation   ceases.      It   should  be  noted  that   since 
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the   absolute  magnitude  of  the  pivot    is   used,   an  EPS   should  be 
chosen  which   reflects  the  magnitude  of  the   elements   in  the 

matrix.      (e.g.,    an  EPS  of  10   '    may  work   fine  with   a  matrix 

3 
having  elements   of  magnitude  10    ,  but  would  be  worthless   in   a 

matrix  having  elements   of  magnitude  10      ). 
ORDER     -  A  1   x  m  array  which  will  keep  track  of  all  the  row  interchanges 

necessary  to   achieve  partial  pivoting.      The   actual  parameter 

passed  is    a  reference  to  the   disk  location  where  the   data  is 

stored. 
FAIL        -  A  label  to  which   control   is   passed  if  the   input   matrix  turns 

out  to  be  singular. 


Back  Substitution: 

The  back  substitution  routine    (named  GAUSS )    is   an  ASK  routine    (called  in 
the   standard  fashion)  with  these  parameters: 

MATRIX  -  The  m  x  m   (square)   matrix  containing  L_  and  U  which   is  the  output 
of  TRIDEC.      The   actual   parameter  passed  is   the   disk  reference 
of  the  matrix. 
N  -  The   same  N  used  by  TRIDEC. 

RIGHTSIDE  -  An  M  x  Q  array   containing  P   (P  <   Q)    right-hand  sides    for  which 
solutions   are  to  be   found.      It    is   padded  with   columns    and  rows 
of  zeros    (as  was   MATRIX)    so  that   Q  is    a  multiple  of  6k.      It   is 
located  on   disk  stored  segment-wise    (i.e.,    for  segments   0,    ...,   R 
(R  =    (P-l)/6U)    store  fields   0,    ...,   S    (S  =    (N-1/6U)   onto   disk  by 
rows).      The   disk  reference  is  the  parameter  passed  to  GAUSS . 
Upon   completion  of  the   subroutine,   RIGHTSIDE  will  have  been  over- 
written and  will   contain  the   solution  vectors   x^ ,    .....   x   . 

~0  ~P 


2k 


P_  -  The  number  of  right-hand  sides    for  which   solutions   are   desired 

(1    <  P   <  2^   -l). 
ORDER     -  The   disk  reference      of     the  ORDER  matrix  generated  by  TRIDEC . 


Not  e s   Ab  out   Us  age : 

TRIDEC  and  GAUSS  are  designed  to  handle  large,  general  matrices. 
(Matrices  of  order  N  <  l66k   are  accepted).   The  following  comments  should 
be  considered  before  using  these  routines: 

1)  If  your  matrix  is  of  order  N  <  6U,  do  not  use  these  routines  as 
they  are  not  efficient  for  arrays  of  that  size.   A  Gaussian  Elimina- 
tion routine  is  available  which  accepts  only  matrices  of  order  N  <  6k. 

2)  If  your  matrix  is  of  a  special  type  (symmetric,  sparse,  etc.)  there 
is  probably  a  solution  routine  available  for  it  which 
can  take  advantage  of  its  special  properties.   GAUSS  and  TRIDEC 
cannot  do  this. 

3)  The  algorithms  are  most  efficient  for  matrices  having  an  order  N  <  600, 
They  will  accept  matrices  up  to  N  =  l66k ,  but  the  closer  one  gets  to 
that  limit,  the  more  time  will  be  spent  doing  I/O.   By  the  time  l66k 
rows  are  reached,  most  of  the  execution  time  will  be  spent  in  an 
I/O  wait  state.   Thus  a  large  matrix  will  need  an  enormous  amount  of 
time  to  execute. 
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